{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {
      "metadata": {},
      "source": [
        "<td>\n",
        "   <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=256/></a>\n",
        "</td>"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "<td>\n",
        "<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/data_rows.ipynb\" target=\"_blank\"><img\n",
        "src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
        "</td>\n",
        "\n",
        "<td>\n",
        "<a href=\"https://github.com/Labelbox/labelbox-python/tree/master/examples/basics/data_rows.ipynb\" target=\"_blank\"><img\n",
        "src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
        "</td>"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Data rows"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "* Data rows are the assets that are being labeled. We currently support the following asset types:\n",
        "    * Image\n",
        "    * Text\n",
        "    * Video\n",
        "    * Geospatial / Tiled Imagery\n",
        "    * Audio\n",
        "    * Documents \n",
        "    * HTML \n",
        "    * DICOM \n",
        "    * Conversational\n",
        "* A data row cannot exist without belonging to a dataset.\n",
        "* Data rows are added to labeling tasks by first attaching them to datasets and then creating batches in projects"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "!pip install labelbox -q"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "import labelbox as lb\n",
        "import uuid\n",
        "import json"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "# API Key and Client\n",
        "Provide a valid api key below in order to properly connect to the Labelbox Client."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Add your api key\n",
        "API_KEY = \"\"\n",
        "client = lb.Client(api_key=API_KEY)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Get data rows from projects"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Pick a project with batches that have data rows with global keys\n",
        "PROJECT_ID = \"<PROJECT-ID>\"\n",
        "project = client.get_project(PROJECT_ID)\n",
        "batches = list(project.batches())\n",
        "print(batches)\n",
        "# This is the same as\n",
        "# -> dataset = client.get_dataset(dataset_id)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Fetch data rows from project's batches\n",
        "\n",
        "Batches will need to be exported from your project as a export parameter. Before you can export from a project you will need an ontology attached."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "client.enable_experimental = True\n",
        "\n",
        "batch_ids = [batch.uid for batch in batches]\n",
        "\n",
        "export_params = {\n",
        " \"attachments\": True,\n",
        "  \"metadata_fields\": True,\n",
        "  \"data_row_details\": True,\n",
        "  \"project_details\": True,\n",
        "  \"performance_details\": True,\n",
        "  \"batch_ids\" : batch_ids # Include batch ids if you only want to export specific batches, otherwise,\n",
        "  #you can export all the data without using this parameter\n",
        "}\n",
        "filters = {}\n",
        "\n",
        "# A task is returned, this provides additional information about the status of your task, such as\n",
        "# any errors encountered\n",
        "export_task = project.export(params=export_params, filters=filters)\n",
        "export_task.wait_till_done()"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "data_rows = []\n",
        "\n",
        "def json_stream_handler(output: lb.JsonConverterOutput):\n",
        "  data_row = json.loads(output.json_str)\n",
        "  data_rows.append(data_row)\n",
        "\n",
        "\n",
        "if export_task.has_errors():\n",
        "  export_task.get_stream(\n",
        "  converter=lb.JsonConverter(),\n",
        "  stream_type=lb.StreamType.ERRORS\n",
        "  ).start(stream_handler=lambda error: print(error))\n",
        "\n",
        "if export_task.has_result():\n",
        "  export_json = export_task.get_stream(\n",
        "    converter=lb.JsonConverter(),\n",
        "    stream_type=lb.StreamType.RESULT\n",
        "  ).start(stream_handler=json_stream_handler)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "# Get single data row\n",
        "data_row = data_rows[0]\n",
        "print(data_row)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Get labels from the data row"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "print(\"Associated label(s)\", data_row[\"projects\"][project.uid][\"labels\"])\n",
        "print(\"Global key\", data_row[\"data_row\"][\"global_key\"])"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Get data row ids by using global keys"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "global_key = \"<ENTER GLOBAL KEY>\"\n",
        "task = client.get_data_row_ids_for_global_keys([global_key])\n",
        "print(f\"Data row id: {task['results']}\")"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Create\n",
        "* Create a single data row with and without metadata"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "dataset = client.create_dataset(name=\"data_rows_demo_dataset\")\n",
        "\n",
        "# It is recommended that you add global keys to your data rows.\n",
        "dataset.create_data_row(row_data=\"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0002.jpeg\",\n",
        "                        global_key=str(uuid.uuid4()))\n",
        "\n",
        "# You can also upload metadata along with your data row\n",
        "mdo = client.get_data_row_metadata_ontology()\n",
        "dataset.create_data_row(row_data=\"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0003.jpeg\",\n",
        "                        global_key=str(uuid.uuid4()),\n",
        "                        metadata_fields=[\n",
        "                            lb.DataRowMetadataField(\n",
        "                              schema_id=mdo.reserved_by_name[\"tag\"].uid,  # specify the schema id\n",
        "                              value=\"tag_string\", # typed inputs\n",
        "                            ),\n",
        "                        ],\n",
        "                      )"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### [Recommended] Bulk create data rows (This is much faster than creating individual data rows)"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Create a dataset\n",
        "dataset = client.create_dataset(name=\"data_rows_demo_dataset_2\")\n",
        "\n",
        "uploads = []\n",
        "# Generate data rows\n",
        "for i in range(1,9):\n",
        "    uploads.append({\n",
        "        \"row_data\":  f\"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_000{i}.jpeg\",\n",
        "        \"global_key\": \"TEST-ID-%id\" % uuid.uuid1(),\n",
        "        ## add metadata (optional)\n",
        "        \"metadata_fields\": [\n",
        "            lb.DataRowMetadataField(\n",
        "                schema_id=mdo.reserved_by_name[\"tag\"].uid,  # specify the schema id\n",
        "                value=\"tag_string\", # typed inputs\n",
        "            ),\n",
        "        ]\n",
        "    })\n",
        "\n",
        "task1 = dataset.create_data_rows(uploads)\n",
        "task1.wait_till_done()\n",
        "print(\"ERRORS: \" , task1.errors)\n",
        "print(\"RESULTS:\" , task1.result)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Create data rows with attachments"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "task2 = dataset.create_data_rows([{\n",
        "    \"row_data\": \"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0009.jpeg\",\n",
        "    \"global_key\": str(uuid.uuid4()),\n",
        "    \"attachments\": [\n",
        "                {\n",
        "                    \"type\": \"IMAGE_OVERLAY\",\n",
        "                    \"value\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/disease_attachment.jpeg\"\n",
        "                },\n",
        "                {\n",
        "                    \"type\": \"RAW_TEXT\",\n",
        "                    \"value\": \"IOWA, Zone 2232, June 2022 [Text string]\"\n",
        "                },\n",
        "                {\n",
        "                    \"type\": \"TEXT_URL\",\n",
        "                    \"value\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt\"\n",
        "                },\n",
        "                {\n",
        "                    \"type\": \"IMAGE\",\n",
        "                    \"value\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/disease_attachment.jpeg\"\n",
        "                },\n",
        "                {\n",
        "                    \"type\": \"VIDEO\",\n",
        "                    \"value\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/drone_video.mp4\"\n",
        "                },\n",
        "                {\n",
        "                    \"type\": \"HTML\",\n",
        "                    \"value\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/windy.html\"\n",
        "                },\n",
        "                {\n",
        "                    \"type\": \"PDF_URL\",\n",
        "                    \"value\": \"https://storage.googleapis.com/labelbox-datasets/arxiv-pdf/data/99-word-token-pdfs/0801.3483.pdf\"\n",
        "                }\n",
        "            ]\n",
        "    }])\n",
        "print(\"ERRORS: \" , task2.errors)\n",
        "print(\"RESULTS:\" , task2.result)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Create data rows using data in your local path"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Local paths\n",
        "local_data_path = \"/tmp/test_data_row.txt\"\n",
        "with open(local_data_path, 'w') as file:\n",
        "    file.write(\"sample data\")\n",
        "\n",
        "task3 = dataset.create_data_rows([local_data_path])\n",
        "print(\"ERRORS: \" , task3.errors)\n",
        "print(\"RESULTS:\" , task3.result)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "# You can mix local files with urls when creating data rows\n",
        "task4 = dataset.create_data_rows([{\n",
        "    \"row_data\": \"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0003.jpeg\",\n",
        "    \"global_key\": str(uuid.uuid4())\n",
        "    }, {\n",
        "    \"row_data\": local_data_path,\n",
        "    \"global_key\": str(uuid.uuid4())\n",
        "    }])\n",
        "print(\"ERRORS: \" , task4.errors)\n",
        "print(\"RESULTS:\" , task4.result)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Update\n",
        "Only two fields can be updated after a data row is created\n",
        "1. Global keys \n",
        "2. Row data\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "data_row = client.get_data_row(\"<data_row_id_to_update>\")\n",
        "new_id = str(uuid.uuid4())\n",
        "data_row.update(global_key=new_id, row_data=\"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_0005.jpeg\")\n",
        "print(data_row)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Create a single attachemt on an existing data row"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# You can only create one attachment at the time.\n",
        "data_row.create_attachment(attachment_type=\"RAW_TEXT\",\n",
        "                           attachment_value=\"LABELERS WILL SEE THIS \")"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Delete"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "* Delete a single data row"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "data_row = client.get_data_row(\"<data_row_id_to_delete>\")\n",
        "data_row.delete()"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "* Bulk delete data row objects"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Bulk delete a list of data_rows ( limit: 4K data rows per call)\n",
        "lb.DataRow.bulk_delete(list(dataset.data_rows()))"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    }
  ]
}